
test(kad): use virtual time to resolve flaky bootstrap test#6361

Open
0xsamalt wants to merge 7 commits into libp2p:master from 0xsamalt:fix-flaky-bootstrap-test

Conversation

@0xsamalt

Description

This PR fixes intermittent CPU-lag flakiness in Kademlia's bootstrap.rs test suite.

The tests previously measured real wall-clock timeouts using futures_timer::Delay and web_time::Instant, so they failed whenever a loaded CI runner overshot the expected timings.

By leveraging Tokio's test-util feature and wrapping the timers in test-only mocks, #[tokio::test(start_paused = true)] makes the elapsed test time fully deterministic and decoupled from wall-clock performance. The tests now run instantly and passed 100/100 iterations when looped locally.

cc @jxs @elenaf9

Notes & open questions

  • This is purely an internal test infrastructure fix. Production paths are untouched because futures_timer::Delay is preserved under #[cfg(not(test))].
  • Because this does not affect any public API or library behavior, a changelog entry was intentionally omitted.
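As a sketch of the cfg-gating pattern mentioned above (names here are hypothetical; the real change lives in the kad timer plumbing), the production and test timers could be split like this, so production builds keep futures_timer::Delay untouched:

```rust
use std::time::Duration;

// Production builds keep the real futures_timer::Delay.
#[cfg(not(test))]
fn delay(duration: Duration) -> futures_timer::Delay {
    futures_timer::Delay::new(duration)
}

// Test builds swap in tokio's sleep, which respects the paused virtual
// clock under #[tokio::test(start_paused = true)].
#[cfg(test)]
async fn delay(duration: Duration) {
    tokio::time::sleep(duration).await;
}
```

Call sites simply write `delay(d).await`, so both configurations present the same interface.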

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation (N/A - Test fix)
  • I have added tests that prove my fix is effective or that my feature works (Modified existing flaky tests)
  • A changelog entry has been made in the appropriate crates (N/A - Internal testing change only)

@mergify
Contributor

mergify Bot commented Mar 31, 2026

This pull request has merge conflicts. Could you please resolve them @0xsamalt? 🙏

@elenaf9
Member

elenaf9 commented Apr 21, 2026

Thank you for this PR @0xsamalt!

Deterministic timings in the test scenario would be great. Still, I don't think we should switch the Delay implementation between test and non-test configuration. In the end we want to test the code that is run in production, which uses futures_timer::Delay.

I haven't seen the flaky kad test in the wild for a while; did you recently run into issues with it?

@0xsamalt
Author

Hey @elenaf9,

That makes complete sense! I agree that testing the exact production dependencies is generally much safer.

To answer your question: yes, I did recently run into this flakiness locally. My CPU had a tiny lag spike during the test runner, so a timer that was supposed to hit the assertion at < 10ms overshot to 12ms and panicked.

I attempted to introduce Tokio's start_paused virtual time here under the assumption that Kademlia was trying to move aggressively towards deterministic test execution, but I completely understand wanting to keep futures_timer::Delay intact across all environments!

Since the flakiness only seems to pop up on heavily loaded/laggy local machines, how would you like to proceed?

  1. Option A: I can happily close this PR so we don't add unnecessary mock objects just to appease a rare edge case.
  2. Option B: If you'd like to protect against the lag edge case without swapping the timer, I can rip out the virtual-time code in this PR and simply widen the time assertion bounds (e.g. adding ~10ms of lag allowance) on those specific lines so they don't panic on slow runners.
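For concreteness, Option B would amount to something like the following (values hypothetical, not the actual bootstrap.rs assertion):

```rust
use std::time::{Duration, Instant};

fn main() {
    let start = Instant::now();
    // ... the operation under test would run here ...
    let elapsed = start.elapsed();

    // The original tight bound, assert!(elapsed < EXPECTED), panics on a
    // 12ms overshoot. Widening the bound tolerates scheduler lag on
    // loaded runners while still catching a genuinely stuck timer.
    const EXPECTED: Duration = Duration::from_millis(10);
    const LAG_ALLOWANCE: Duration = Duration::from_millis(10);

    assert!(
        elapsed < EXPECTED + LAG_ALLOWANCE,
        "took {elapsed:?}, expected under {:?}",
        EXPECTED + LAG_ALLOWANCE
    );
}
```

The trade-off is that a widened bound still depends on wall-clock behavior, so it reduces rather than eliminates the flake.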

Let me know what you prefer!
